PCT 



WORLD INTELLECTUAL PROPERTY ORGANIZATION 
International Bureau 



INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT) 



1 



(51) International Patent Classification * : 

C12Q 1/68, C12N 9/14, 9/16 



A1



(11) International Publication Number: 



WO 97/46701 



(43) International Publication Date: 11 December 1997 (11.12.97)



(21) International Application Number: PCT/US97/08705 

(22) International Filing Date: 20 May 1997 (20.05.97) 



(30) Priority Data: 

08/658,322    5 June 1996 (05.06.96)    US
08/803,621    21 February 1997 (21.02.97)    US



(71) Applicant: FOX CHASE CANCER CENTER [US/US]; 7701
Burholme Avenue, Philadelphia, PA 19111 (US).

(72) Inventor: YEUNG, Anthony, T.; 217 Walnut Place,
Philadelphia, PA 19083 (US).

(74) Agents: HAGAN, Patrick, J. et al.; Dann, Dorfman, Herrell 
and Skillman, Suite 720, 1601 Market Street, Philadelphia, 
PA 19103-2307 (US). 



(81) Designated States: AU, CA, JP, MX, European patent (AT, 
BE, CH, DE, DK, ES, FI, FR, GB, GR, IE, IT, LU, MC, 
NL, PT, SE). 



Published 

With international search report. 



(54) Title: MISMATCH ENDONUCLEASES AND USES THEREOF IN IDENTIFYING MUTATIONS IN TARGETED
POLYNUCLEOTIDE STRANDS

(57) Abstract 

An endonuclease isolated from celery, CEL I, is disclosed as well as methods for use in detection of mutations in targeted 
polynucleotides. The methods facilitate localization and identification of mutations, mismatches and polymorphisms. The enzyme recognizes 
every type of mismatch regardless of the sequence context in which the mismatch resides and the enzyme is active in pH ranges from acidic 
to basic. 







MISMATCH ENDONUCLEASES AND USES THEREOF IN IDENTIFYING
MUTATIONS IN TARGETED POLYNUCLEOTIDE STRANDS 



Pursuant to 35 U.S.C. §202(c), it is hereby
acknowledged that the U.S. Government has certain
rights in the invention described herein, which was
made in part with funds from the National Institutes
of Health, National Cancer Institute.

This application is a continuation-in-part 
of U.S. Application, Serial No. 08/658,322, filed June 
5, 1996, the entire disclosure of which is 
incorporated by reference herein. 

FIELD OF THE INVENTION

This invention relates to materials and 
methods for the detection of mutations in targeted 
nucleic acids. More specifically, the invention 
provides novel mismatch-specific nucleases and methods
of use of the enzyme that facilitate the genetic
screening of hereditary diseases and cancer. The
method is also useful for the detection of genetic
polymorphisms.

BACKGROUND OF THE INVENTION 

Several publications are referenced in this
application by numerals in parentheses in order to
more fully describe the state of the art to which this
invention pertains. Full citations for these
references are found at the end of the specification.
The disclosure of each of these publications is
incorporated by reference in the present specification.

The sequence of nucleotides within a gene
can be mutationally altered or "mismatched" in any of
several ways, the most frequent of which are base-pair
substitutions, frameshift mutations and
deletions or insertions. These mutations can be
induced by environmental factors, such as radiation
and mutagenic chemicals; errors are also occasionally
committed by DNA polymerases during replication. Many
human disease states arise because fidelity of DNA
replication is not maintained. Cystic fibrosis,
sickle cell anemia and some cancers are caused by
single base changes in the DNA resulting in the
synthesis of aberrant or non-functional proteins.

The high growth rate of plants and the
abundance of DNA intercalators in plants suggest an
enhanced propensity for mismatch and frameshift
lesions. Plants and fungi are known to possess an
abundance of single-stranded specific nucleases that
attack both DNA and RNA (9-14). Some of these, like
the Nuclease α of Ustilago maydis, are suggested to
take part in gene conversion during DNA recombination
(15,16). Of these nucleases, S1 nuclease from
Aspergillus oryzae (17), P1 nuclease from
Penicillium citrinum (18), and Mung Bean Nuclease from
the sprouts of Vigna radiata (19-22) are the best
characterized. S1, P1 and the Mung Bean Nuclease are
Zn proteins active mainly near pH 5.0, while Nuclease α
is active at pH 8.0. The single-strandedness property
of DNA lesions appears to have been used by a plant
enzyme, SP nuclease, for bulky adduct repair. The
nuclease SP, purified from spinach, is a
single-stranded DNase, an RNase, and able to incise DNA at
TC (6-4) dimers and cisplatin lesions, all at neutral pH
(23,24). It is not yet known whether SP can incise
DNA at mismatches.
In Escherichia coli, lesions of
base-substitution and unpaired DNA loops are repaired
by a methylation-directed long patch repair system.
The proteins in this multienzyme system include MutH,
MutL and MutS (1, 2). This system is efficient, but
the C/C lesion and DNA loops larger than 4 nucleotides
are not repaired. The MutS and MutL proteins are
conserved from bacteria to humans, and appear to be
able to perform similar repair roles in higher
organisms. For some of the lesions not well repaired
by the MutS/MutL system, and for gene conversion where
short-patch repair systems may be more desirable,
other mismatch repair systems with novel capabilities
are needed.

Currently, the most direct method for
mutational analysis is DNA sequencing; however, it is
also the most labor intensive and expensive. It is
usually not practical to sequence all potentially
relevant regions of every experimental sample.
Instead, some type of preliminary screening method is
commonly used to identify and target for sequencing
only those samples that contain mutations. Single
stranded conformational polymorphism (SSCP) is a
widely used screening method based on mobility
differences between single-stranded wild type and
mutant sequences on native polyacrylamide gels. Other
methods are based on mobility differences in wild
type/mutant heteroduplexes (compared to control
homoduplexes) on native gels (heteroduplex analysis)
or denaturing gels (denaturing gradient gel
electrophoresis). While sample preparation is
relatively easy in these assays, very exacting
conditions for electrophoresis are required to
generate the often subtle mobility differences that
form the basis for identifying the targets that
contain mutations. Another critical parameter is the
size of the target region being screened. In general,
SSCP is used to screen target regions no longer than
about 200-300 bases. The reliability of SSCP for
detecting single-base mutations is somewhat uncertain
but is probably in the 70-90% range for targets less
than 200 bases. As the size of the target region
increases, the detection rate declines, for example in
one study from 87% for 183 bp targets to 57% for
targets 307 bp in length (35). The ability to screen
longer regions in a single step would enhance the
utility of any mutation screening method.
Another type of screening technique
currently in use is based on cleavage of unpaired
bases in heteroduplexes formed between wild type
probes hybridized to experimental targets containing
point mutations. The cleavage products are also
analyzed by gel electrophoresis, as subfragments
generated by cleavage of the probe at a mismatch
generally differ significantly in size from full
length, uncleaved probe and are easily detected with a
standard gel system. Mismatch cleavage has been
effected either chemically (osmium tetroxide,
hydroxylamine) or with a less toxic, enzymatic
alternative, using RNase A. The RNase A cleavage
assay has also been used, although much less
frequently, to screen for mutations in endogenous mRNA
targets or for detecting mutations in DNA targets
amplified by PCR. A mutation detection rate of over
50% was reported for the original RNase screening
method (36).

A newer method to detect mutations in DNA
relies on DNA ligase, which covalently joins two
adjacent oligonucleotides which are hybridized on a
complementary target nucleic acid. The mismatch must
occur at the site of ligation. As with other methods
that rely on oligonucleotides, salt concentration and
temperature at hybridization are crucial. Another
consideration is the amount of enzyme added relative
to the DNA concentration.

The methods mentioned above cannot reliably
detect a base change in a nucleic acid which is
contaminated with more than 80% of a background
nucleic acid, such as normal or wild type sequences.
Contamination problems are significant in cancer
detection wherein a malignant cell, in circulation for
example, is present in extremely low amounts. The
methods now in use lack adequate sensitivity to be
practically applied in the clinical setting.

A method for the detection of gene mutations
with mismatch repair enzymes has been described by
Lu-Chang and Hsu. See WO 93/20233. The product of the
MutY gene which recognizes mispaired A/G residues is 
employed in conjunction with another enzyme described 
in the reference as an "all type enzyme" which can 
nick at all base pair mismatches. The enzyme does not 
detect insertions and deletions. Also, the all type 
enzyme recognizes different mismatches with differing 
efficiencies and its activity can be adversely 
affected by flanking DNA sequences. This method 
therefore relies on a cocktail of mismatch repair 
enzymes and DNA glycosylases to detect the variety of 
mutations that can occur in a given DNA molecule. 

Often, in the clinical setting, the nature 
of the mutation or mismatch is unknown so that the use 
of specific DNA glycosylases is precluded. Thus, 
there is a need for a single enzyme system that is 
capable of recognizing all mismatches with equal 
efficiency and also detecting insertions and 
deletions, regardless of the flanking DNA sequences. 
It would be beneficial to have a sensitive and 
accurate assay for detecting single base pair 
mismatches which does not require a large amount of 
sample, does not require the use of toxic chemicals, 
is neither labor intensive nor expensive and is 
capable of detecting not only mismatches but deletions 
and insertions of DNA as well. 

Such a system, coupled with a method that
would facilitate the identification of the location of
the mutation in a given DNA molecule, would be clearly
advantageous for genetic screening applications. It
is the purpose of the present invention to provide
this novel mutation detection system.

SUMMARY OF THE INVENTION 

The present invention provides materials and 
methods for the detection of mutations or mismatches 
in a targeted polynucleotide strand. Detection is 
achieved using novel endonucleases in combination with 
a gel assay system that facilitates the screening and 
identification of altered base pairing in targeted 
nucleic acid strands. 

According to one aspect of the invention,
there is provided a novel plant-based nuclease which
is useful in the detection of mutations or mismatches
in target DNA or RNA. Celery (Apium graveolens var.
dulce), for example, contains abundant amounts of
the nuclease of the invention, which is highly specific
for insertional/deletional DNA loop lesions and
mismatches. This enzyme, designated herein as CEL I,
incises at the phosphodiester bond at the 3' side of
the mismatched nucleotide. CEL I has been purified
about 10,000 fold, so as to be substantially
homogeneous.

In a preferred embodiment of the invention,
a method is provided for determining a mutation in a
target sequence of single stranded mammalian
polynucleotide with reference to a non-mutated
sequence of a polynucleotide that is hybridizable with
the polynucleotide including the target sequence. The
sequences are amplified by polymerase chain reaction
(PCR), labeled with a detectable marker, hybridized to
one another, exposed to CEL I of the present
invention, and analyzed on gels for the presence of
the mutation.
The plant-based endonuclease of the
invention has a unique combination of properties.
These include the ability to detect all possible
mismatches between the hybridized sequences formed in
performing the method of the invention; recognize
polynucleotide loops and insertions between such
hybridized sequences; detect polymorphisms between
such hybridized strands; recognize sequence
differences in polynucleotide strands between about
100 bp and 3 kb in length; and recognize such mutations
in a target polynucleotide sequence without
substantial adverse effects of flanking DNA sequences.

The plant-based endonuclease of the invention, CEL I,
is not unique to celery. Functionally
similar enzymatic activities have been demonstrated in
fourteen different plant species. Therefore, the
enzyme is likely to be conserved in the plant kingdom
and may be purified from plants other than celery.
The procedure to purify this endonuclease activity
from a plant other than celery is well known to those
skilled in the art and is contemplated to be within
the scope of the present invention. Such enzymes have
been purified to substantial homogeneity from the
plant species Arabidopsis thaliana, for example, in
accordance with the present invention. This novel
enzyme, designated ARA I, is like CEL I in its
enzymatic activities and thus may be used to advantage
in the genetic mutation screening assays of the
invention.

The plant-based endonuclease may not be limited
to the plant kingdom but may be found in other life
forms as well. Such enzymes may serve functions
similar to that of CEL I in celery or be adapted for
other special steps of DNA metabolism. Such enzymes
or the genes encoding them may be used or modified to
produce enzymatic activities that can function the
same as or similarly to CEL I. The isolation of such genes
and their modification is also within the scope of the
present invention.

In another embodiment of the invention, the
above-described method is employed in conjunction with
the addition of DNA ligase, DNA polymerase or a
combination thereof, thereby reducing non-specific DNA
cleavage.

In yet another embodiment of the invention,
the simultaneous analysis of multiple samples is
performed using the above-described enzyme and method
of the invention by a technique referred to herein as
multiplex analysis.

In order to more clearly set forth the
parameters of the present invention, the following
definitions are provided:

The term "endonuclease" refers to an enzyme 
that can cleave DNA internally. 

The term "isolated nucleic acid" refers to a
DNA or RNA molecule that is separated from sequences
with which it is normally immediately contiguous (in
the 5' and 3' directions) in the naturally occurring
genome of the organism in which it originates.

The term "base pair mismatch" indicates a
base pair combination that generally does not form in
nucleic acids according to Watson and Crick base
pairing rules. For example, when dealing with the
bases commonly found in DNA, namely adenine, guanine,
cytosine and thymine, base pair mismatches are those
base combinations other than the A-T and G-C pairs
normally found in DNA. As described herein, a
mismatch may be indicated, for example, as C/C, meaning
that a cytosine residue is found opposite another
cytosine, as opposed to the proper pairing partner,
guanine.
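The definition above can be restated as a small check. The following is a minimal sketch only; the function and constant names are hypothetical and are not part of the specification. It assumes the four DNA bases named in the definition and the X/Y notation (top strand base over bottom strand base) used herein.

```python
# Illustrative sketch; all names here are hypothetical, not from the specification.
BASES = ("A", "C", "G", "T")
# The A-T and G-C pairs normally found in DNA, in both orientations.
WATSON_CRICK = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def is_mismatch(top, bottom):
    """Return True when two opposed DNA bases are not a Watson-Crick pair."""
    top, bottom = top.upper(), bottom.upper()
    if top not in BASES or bottom not in BASES:
        raise ValueError("expected one of A, C, G, T")
    return (top, bottom) not in WATSON_CRICK


def mismatch_label(top, bottom):
    """Format a mismatch in the X/Y notation used in the text, e.g. 'C/C'."""
    if not is_mismatch(top, bottom):
        raise ValueError("A-T and G-C are normal pairs, not mismatches")
    return f"{top.upper()}/{bottom.upper()}"


def all_mismatches():
    """List every base combination other than the A-T and G-C pairs."""
    return [f"{a}/{b}" for a in BASES for b in BASES if is_mismatch(a, b)]
```

Under these assumptions, sixteen ordered base combinations exist, four of which are normal pairs, so `all_mismatches()` enumerates twelve mismatch labels.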

The phrase "DNA insertion or deletion"
refers to the presence or absence of "matched" bases
between two strands of DNA such that complementarity
is not maintained over the region of inserted or
deleted bases.

The term "complementary" refers to two DNA
strands that exhibit substantial normal base pairing
characteristics. Complementary DNA may contain one or
more mismatches, however.

The term "hybridization" refers to the
hydrogen bonding that occurs between two complementary
DNA strands.

The phrase "flanking nucleic acid sequences" 
refers to those contiguous nucleic acid sequences that 
are 5' and 3' to the endonuclease cleavage site. 

The term "multiplex analysis" refers to the
simultaneous assay of pooled DNA samples according to
the above-described methods.

The term "substantially pure" refers to a
preparation comprising at least 50-60% by weight of
the material of interest. More preferably, the
preparation comprises at least 75% by weight, and most
preferably 90-99% by weight of the material of
interest. Purity is measured by methods appropriate
for the material being purified, which in the case of
protein includes chromatographic methods, agarose or
polyacrylamide gel electrophoresis, HPLC analysis and
the like.

C>T indicates the substitution of a cytosine 
residue for a thymidine residue giving rise to a 
mismatch. Inappropriate substitution of any base for
another giving rise to a mismatch or a polymorphism
may be indicated this way.

N,N,N',N'-tetramethyl-6-carboxyrhodamine
(TAMRA) is a fluorescent dye used to label DNA
molecular weight standards which are in turn utilized
as an internal standard for DNA analyzed by automated
DNA sequencing.

Primers may be labeled fluorescently with
6-carboxyfluorescein (6-FAM). Alternatively, primers may
be labeled with 4,7,2',7'-tetrachloro-6-carboxyfluorescein
(TET). Other alternative DNA
labeling methods are known in the art and are
contemplated to be within the scope of the invention.

CEL I has been purified so as to be 
substantially homogeneous, thus, peptide sequencing of 
the amino terminus is envisioned to provide the 
corresponding specific oligonucleotide probes to 
facilitate cloning of the enzyme from celery. 
Following cloning and sequencing of the gene, it may 
be expressed in any number of recombinant DNA systems. 
This procedure is well known to those skilled in the 
art and is contemplated to be within the scope of the 
present invention. 

BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1 shows the results of sodium dodecyl 
sulfate (SDS) polyacrylamide gel analysis of the 
purified enzyme, CEL I. The positions of molecular 
weight markers are shown on the side. T indicates the 
top of the resolving gel.

Figure 2 depicts certain heteroduplex DNA
substrates used in performing nucleic acid analyses in
accordance with the present invention. Figure 2A
depicts a 64-mer which can be terminally labeled at
either the 5'-P or the 3'-OH. The nucleotide
positions used as a reference in this analysis are
indicated irrespective of the number of nucleotide
insertions at X in the top strand. The inserted
sequences and substrate numbers are indicated in the
table. Figure 2B illustrates mismatched basepair
substrates used in this analysis, with the identities
of nucleotides Y and Z varied as in the accompanying
table to produce various mispaired substrates.

Figure 3 is an autoradiogram demonstrating 
the effect of temperature on CEL I incisions in 
different substrates.

Figure 4 is an autoradiogram illustrating 
the relative incision preferences of CEL I at DNA 
loops of one nucleotide. Figure 4A shows that in 
addition to the X=G, the X=C also allows two alternate 
basepairing conformations. Figure 4B demonstrates 
that the bottom strand of the substrate is competent 
for CEL I incision as in the C/C mismatch, #10, in 
lane 16. 

Figure 5 is an autoradiogram of denaturing 
15% polyacrylamide gels showing the AmpliTaq DNA 
polymerase mediated stimulation of purified CEL I 
incision at DNA mismatches of a single extrahelical 
nucleotide. F indicates the full length substrate, 64 
nucleotides long, labeled at the 5' terminus (*) of 
the top strand. In panels 5A, 5B and 5C, substrates 
were treated with varying quantities of CEL I in the 
presence or absence of DNA polymerase.

Figure 6 is an autoradiogram showing the pH 
optimum of CEL I incision at the extrahelical G 
residue in the presence or absence of AmpliTaq DNA 
polymerase. The top panel shows the CEL I activity in 
the absence of AmpliTaq DNA polymerase. The bottom 
panel shows CEL I activity in the presence of 
polymerase. 

Figure 7 is an autoradiogram demonstrating
the recognition of base substitution mismatches by
purified CEL I in the presence of AmpliTaq DNA
polymerase. (I) indicates the primary incision site
at the phosphodiester bond 3' of a mismatched
nucleotide. Panel 7A illustrates cleavage of the
substrate in the presence of both CEL I and DNA
polymerase. In panel 7B, CEL I was omitted.

Figure 8 is an autoradiogram illustrating
the ability of CEL I to recognize mutations in pooled
DNA samples in the presence of excess wild-type DNA.
Lanes 3, 5, 6, 10, 11, 12, and 13 contain single
samples containing wild type heteroduplexes. Lanes 4
and 6 contain an AG deletion. Lanes 8 and 9 contain a
substrate with an 11 base-pair loop. The samples
described above were pooled and treated with CEL I.
The results of this "multiplex analysis" are shown in
Lane 14.

Figure 9 is an autoradiogram further
illustrating the ability of CEL I to recognize
mutations in the presence of excess wild-type DNA. 1,
2, 3, 4, 10 or 30 heteroduplexed, radiolabeled PCR
products (amplified from exon 2 of the BRCA1 gene)
were exposed to CEL I in a single reaction tube and
the products run on a 6% polyacrylamide gel. Lanes 1
and 2 are negative controls run in the absence of CEL
I. Lanes 3 to 11 contain 1 sample with the AG deletion
in the presence of increasing amounts of wild-type
non-mutated heteroduplexes.
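The dilution described for the pooled reactions can be estimated with a short calculation. This is a rough sketch under stated assumptions, none of which is taken from the specification: strands are assumed to reanneal at random after denaturation, the one mutation-bearing sample is assumed to be heterozygous (half of its strands carry the deletion), and every sample is assumed to contribute an equal amount of PCR product. All function names are hypothetical.

```python
def heteroduplex_fraction(mutant_strand_fraction):
    """Fraction of duplexes pairing one mutant and one wild-type strand.

    Assumes random reannealing, so with mutant strand fraction p the
    mixed duplexes make up 2 * p * (1 - p) of the total.
    """
    p = mutant_strand_fraction
    if not 0.0 <= p <= 1.0:
        raise ValueError("fraction must lie between 0 and 1")
    return 2.0 * p * (1.0 - p)


def pooled_mutant_fraction(n_samples, carrier_mutant_fraction=0.5):
    """Mutant strand fraction when one carrier sample is pooled with
    n_samples - 1 wild-type samples of equal amount (an assumption)."""
    if n_samples < 1:
        raise ValueError("need at least one sample")
    return carrier_mutant_fraction / n_samples


# Pool sizes mentioned for Figure 9; the printed values are estimates only.
for n in (1, 2, 3, 4, 10, 30):
    p = pooled_mutant_fraction(n)
    print(n, round(heteroduplex_fraction(p), 4))
```

The sketch only indicates how quickly the cleavable heteroduplex becomes a minor species as more wild-type product is pooled; it says nothing about the detection limit actually observed on the gel.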

Figure 10 shows a schematic representative 
diagram of the BRCA1 gene and the exon boundaries in 
the gene. 


Figure 11 is a histogram of a sample showing
the localization of a 5 base deletion in the 11D exon
of BRCA1 following PCR amplification and treatment
with CEL I. A spike indicates a DNA fragment of a
specific size generated by cleavage by CEL I at the
site of a mismatch. Panel A shows the results
obtained with a 6-FAM labeled primer annealed at
nucleotide 3177 of BRCA1. Panel B shows the results
obtained with a TET labeled primer annealed 73 bases
into the intron between exon 11 and exon 12. Panel C
represents the TAMRA internal lane size standard.
Note that the position of the mutation can be assessed
on both strands of DNA.

Figure 12 is a histogram of a sample showing
the localization of a nonsense mutation, A>T, at
position 2154 and a polymorphism C>T at nucleotide
2201 in the 11C exon of BRCA1 following PCR
amplification and treatment with CEL I. Panel A shows
a spike at base #700 and Panel B shows a spike at #305
corresponding to the site of the nonsense mutation.
Panel C is the TAMRA internal lane standard.
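Because each strand carries its own label, a single mismatch gives one labeled fragment per strand, and the two fragment sizes should add up to roughly the length of the PCR product. The sketch below illustrates that arithmetic with the two spike positions reported for Figure 12. It is an approximation under stated assumptions: the amplicon length is not given in the text and is only inferred here, and the small offset arising from incision on the 3' side of the mismatch on each strand, as well as sizing error, is ignored. The function names are hypothetical.

```python
def labeled_fragment_lengths(amplicon_length, cut_from_forward_end):
    """Sizes of the two end-labeled fragments produced by one mismatch.

    cut_from_forward_end is the distance, in nucleotides, from the 5' end
    of the forward-labeled strand to the incision site. The complementary
    strand is read from the opposite end, so its labeled fragment spans
    the remainder of the amplicon (offsets of a nucleotide or so ignored).
    """
    if not 0 < cut_from_forward_end < amplicon_length:
        raise ValueError("cut site must fall inside the amplicon")
    return cut_from_forward_end, amplicon_length - cut_from_forward_end


def implied_amplicon_length(forward_fragment, reverse_fragment):
    """Approximate amplicon length implied by a pair of spikes."""
    return forward_fragment + reverse_fragment


# Spikes reported for the nonsense mutation: 700 in Panel A, 305 in Panel B.
length = implied_amplicon_length(700, 305)
print(length)
print(labeled_fragment_lengths(length, 700))
```

A pair of spikes whose sizes do not sum to approximately the amplicon length would, on this reasoning, point to two separate sites rather than one site seen from both strands.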

Figure 13 shows the results obtained from
four different samples analyzed for the presence of
mutations in exon 11A using the methods of the instant
invention. Results from the 6-FAM samples are shown.
Panel A shows a polymorphism T>C at nucleotide 2430
and a second spike at position #483 corresponding to
the site of another polymorphism C>T at nucleotide
2731. Panel B shows only the second polymorphism
described in panel A. Panel C shows no polymorphism
or mutation. Panel D shows the two polymorphisms seen
in panel A.

Figure 14 depicts a gel showing the
purification scheme for ARA I mismatch endonuclease of
Arabidopsis thaliana. Lane 1: Crude extract of cells
broken by French Press; Lane 2: 25% - 85% saturated
ammonium sulfate fractionation; Lane 3: Con A-Sepharose
affinity column, ARA I was eluted by α-methyl
mannoside; Lane 4: Phosphocellulose P-11
column ARA I peak; Lane 5: DEAE Sephacel anion
exchange column ARA I peak. The molecular weight
standards are shown in lanes indicated with "S".

Figure 15 shows an autoradiogram of a denaturing 
DNA sequencing gel analysis demonstrating that ARA I 
cuts mismatched substrates throughout the purification 
scheme. Lane numbers correspond to those of the 
purification steps in Figure 14. Panels A, B, C 
illustrate the ARA I cutting of substrate #2, 
substrate #4 and substrate #18 (no-mismatch control
substrate), respectively. F = full length, I = ARA I
cut.

Figure 16 is a schematic diagram of the ARA 
I based mismatch detection assay. 

Figure 17 is an illustration of data 
obtained from GeneScan analysis of endonucleolytic 
activity of ARA I on a heteroduplex containing a 
mismatch. 

Figure 18 shows a comparison of the GeneScan
mutation detection of ARA I versus CEL I, involving a
series of control reactions using the wild type allele
of exon 19 of the BRCA1 gene. This fragment of DNA
does not contain any mutations and accordingly no
mismatch nicking was observed. Panels A and B show
the two strands treated with 7 ng of CEL I and
stimulated in mismatch cutting by AmpliTaq DNA
polymerase, B = (6-FAM); G = (TET). Panels C and D
are the two strands treated with 20 ng of purified ARA
I without stimulation by AmpliTaq DNA polymerase.
Panels E and F show the two strands treated with 20 ng
of ARA I stimulated for mismatch cutting by the
presence of AmpliTaq DNA polymerase.

Figure 19 depicts a side-by-side GeneScan
analysis of CEL I and ARA I mismatch detection
activity in exon 19 of the BRCA1 gene. Panels A and B
show mismatch cutting using 7 ng of CEL I in the
presence of 0.5 units of AmpliTaq DNA polymerase.
Panels C and D show the cutting of an A nucleotide
deletion mismatch by 20 ng of ARA I without AmpliTaq
DNA polymerase. Panels E and F show the cutting of
the same substrate by 2 ng of ARA I stimulated in
mismatch cutting by the presence of 0.5 ng units of
AmpliTaq DNA polymerase. All mutations and
polymorphisms detected were confirmed by automated
sequencing. These results suggest that ARA I, like
the CEL I mutation detection method, can identify
mutations that are difficult to detect with SSCP or
DNA sequencing.

Figure 20 shows a comparison of the GeneScan
mutation detection of ARA I versus CEL I, involving a
series of control reactions employing the wild type
allele of exon 2 of the BRCA1 gene. As in Figure 18,
this gene segment does not contain any mutations, thus
no mismatch nicking is observed. Panels A and B show
the two strands treated with 7 ng of CEL I stimulated
in mismatch cutting by 0.5 units of AmpliTaq DNA
polymerase. Panels C and D show the two strands
treated with 20 ng of purified ARA I without
stimulation by AmpliTaq DNA polymerase. Panels E and
F show the two strands treated with 20 ng of ARA I
stimulated for mismatch cutting by the presence of
AmpliTaq DNA polymerase. Panels G and H show the two
strands treated with 2 ng of ARA I stimulated for
mismatch cutting by the presence of 0.5 units of
AmpliTaq DNA polymerase.

Figure 21 depicts GeneScan analysis of CEL I and
ARA I mismatch detection in exon 2 of the BRCA1 gene.
Panels A and B show an AG-deletion mismatch cutting by
7 ng of CEL I in the presence of 0.5 units of AmpliTaq
DNA polymerase. Panels C and D show the cutting of
the AG nucleotide deletion mismatch by 20 ng of ARA I
without AmpliTaq DNA polymerase. Panels E and F show
the cutting of the same substrate by 20 ng of ARA I
stimulated in mismatch cutting by the presence of 0.5
ng units of AmpliTaq DNA polymerase. Panels G and H
show the cutting of the same substrate by 2 ng of ARA
I stimulated in mismatch cutting by the presence of
0.5 units of AmpliTaq DNA polymerase.

Figure 22 is an autoradiogram demonstrating
that mismatch endonuclease activity similar to that of
ARA I and CEL I is present in the extracts of 10 other
plants.

Figure 23 is an autoradiogram showing that
mismatch endonuclease activity similar to ARA I and
CEL I is present in the extracts of 11 other plants.

DETAILED DESCRIPTION OF THE INVENTION

The enzymatic basis for the maintenance of
correct base sequences during DNA replication has been
extensively studied in E. coli. This organism has
evolved a mismatch repair pathway that corrects a
variety of DNA basepair mismatches in hemimethylated
DNA as well as insertions/deletions up to four
nucleotides long. Cells deficient in this pathway
mutate more frequently, hence the genes are called
MutS, MutL, MutH, etc. MutS protein binds to the
mismatch and MutH is the endonuclease that incises the
DNA at a GATC site on the strand in which the A
residue is not methylated. MutL forms a complex with
MutH and MutS during repair. Homologs of MutS and
MutL, but not MutH, exist in many systems. In yeast,
MSH2 (MutS homolog) can bind to a mismatch by itself,
but a complex of two MutL homologs (MLH and PMS1) plus
a MSH2 has been observed. The human homolog hMSH2 has
evolved to bind to larger DNA insertions up to 14
nucleotides in length, which frequently arise by
mechanisms such as misalignment at the microsatellite
repeats in humans. A role for hMLH1 in loop repair is
unclear. Mutations in any one of these human homologs
were shown to be responsible for the hereditary form
of non-polyposis colon cancer (27, 28).

Celery contains over 40 µg of psoralen, a
photoreactive intercalator, per gram of tissue (3).
As a necessity, celery may possess a high capability
for the repair of lesions of insertion, deletion, and
other psoralen photoadducts. Single-strandedness at
the site of the lesion is common to base substitution
and DNA loop lesions. The data in the following
examples demonstrate that celery, Arabidopsis thaliana
and other plant species possess ample mismatch-specific
endonucleases to deal with these potentially
mutagenic events.

It has been found that the incision at a mismatch
site by CEL I is greatly stimulated by the presence of
a DNA polymerase. For a DNA loop containing a single
nucleotide insertion, CEL I substrate preference is
A > G > T > C. For base-substitution mismatched
basepairs, CEL I preference is C/C > C/A ~ C/T > G/G >
A/C ~ A/A ~ T/C > T/G ~ G/T ~ G/A ~ A/G > T/T. CEL I
shows a broad pH optimum from pH 6 to pH 9. To a
lesser extent compared with loop incisions, CEL I is
also a single-stranded DNase, and a weak exonuclease.
CEL I possesses novel biochemical activities when
compared to other nucleases. Mung Bean Nuclease is a
39 kd nuclease that is a single-stranded DNase and
RNase, and has the ability to nick DNA at destabilized
regions and DNA loops (19-22). However, it has a pH
optimum at 5.0. It is not known whether Mung Bean
Nuclease activity can be stimulated by a DNA
polymerase as in the case of CEL I. Thus CEL I and
Mung Bean Nuclease appear to be different enzymes;
however, this has not yet been conclusively confirmed.

The mechanism responsible for the AmpliTaq
DNA polymerase stimulation of the CEL I activity is
presently unknown. One possibility is that the DNA
polymerase has a high affinity for the 3'-OH group
produced by the CEL I incision at the mismatch and
displaces CEL I simply by competition for the site.
CEL I may have different affinities for the 3'-OH
termini generated by incisions at different
mismatches, thereby attenuating the extent to which
AmpliTaq DNA polymerase can stimulate its activity.
The use of a DNA polymerase to displace a repair
endonuclease in DNA repair was also observed for the
UvrABC endonuclease mechanism (25). It was shown that
the UvrABC endonuclease does not turn over unless it is
in the presence of DNA polymerase I. The protein
factors in vivo that can stimulate the CEL I activity
may not be limited to DNA polymerases. It is possible
that DNA helicases, DNA ligases, 3'-5' exonucleases or
proteins that bind to DNA termini may perform that
function.

It is important to note that a 5'-labeled
substrate can be used to show a CEL I incision band in
a denaturing polyacrylamide gel. Recently, a putative
human all-type mismatch incision activity (24) was
shown to be related to the human topoisomerase I.
This enzyme is unable to release itself from a
5'-labeled substrate after mismatch nicking due to the
formation of a covalent enzyme-DNA intermediate with
the 3' terminus of the DNA nick (26). This covalent
protein-DNA complex cannot migrate into the denaturing
polyacrylamide gel to form a band. CEL I mismatch
nicking has been demonstrated with 5'-labeled
substrates. Therefore, CEL I is not a plant
equivalent of the topoisomerase I-like human all-type
mismatch repair activity.

CEL I appears to be a mannopyranosyl
glycoprotein as judged by its tight binding to
Concanavalin A-Sepharose resin and by the staining of
CEL I with the Periodic acid-Schiff glycoprotein
stain. Insofar as is known, no repair enzyme has been
demonstrated to be a glycoprotein. Glycoproteins are
often found to be excreted from the cell, on cellular
membranes or secreted into organelles. However,
glycoproteins have also been shown to exist in the
nucleus for important functions. The level of a 100
kDa stress glycoprotein was found to increase in the
nucleus when gerbil fibroma cells are subjected to
heat shock treatment (27). Transcription factors for
RNA polymerase II in human cells are known to be
modified with N-acetylglucosamine residues (28, 29).
Recently, lactoferrin, an iron-binding glycoprotein,
was found to bind to DNA in the nucleus of human cells
and it activated transcription in a sequence-specific
manner (30). The nuclei of cells infected with some
viruses are known to contain viral glycoproteins
(31-33). These examples where glycoproteins are known to
exist inside the nucleus, not merely on the nuclear
membrane or at the nuclear pores, tend to show that
glycosylated proteins may be important in the nucleus.
CEL I appears to be an example of a glycoprotein that
can participate in DNA repair.

The properties of the celery mismatch
endonuclease CEL I resemble those of single-stranded
nucleases. The best-suited substrates for CEL I are
DNA loops and base-substitution mismatches such as the
C/C mismatch. In contrast, loops greater than 4
nucleotides and the C/C mismatch are the substrates
worst-suited for the E. coli mutHLS mismatch repair
system (1, 2). Thus CEL I is an enzyme that possesses
novel mismatch endonuclease activity.

The following examples are provided to 
describe the invention in further detail. These 
examples, which set forth the best mode presently 
contemplated for carrying out the invention, are 
intended to illustrate and not to limit the invention. 

Example I 
Purification of CEL I 

Two different CEL I preparations were made 
up as described below. Their properties are similar 
except that the less pure preparation (Mono Q 
fraction) may contain protein factors that can 
stimulate the CEL I activity. 

(i) Preparation of CEL I Mono Q fraction

100 gm of celery stalk was homogenized in a
Waring blender with 100 ml of a buffer of 0.1 M
Tris-HCl pH 7.0 with 10 µM phenylmethanesulfonyl
fluoride (PMSF) (Buffer A) at 4°C for 2 minutes. The
mixture was cleared by centrifugation, and the
supernatant was stored at -70°C. The extract was
fractionated by anion exchange chromatography on an
FPLC Mono Q HR5/10 column. The bound CEL I nuclease
activity was eluted with a linear gradient of salt at
about 0.15 M KCl.

(ii) Preparation of highly purified CEL I

7 kg of celery at 4°C was extracted with a
juicer and adjusted with 10X Buffer A to give a final
concentration of 1X Buffer A. The extract was
concentrated with a 25% to 85% saturation ammonium
sulfate precipitation step. The final pellet was
dissolved in 250 ml of Buffer A and dialyzed against
0.5 M KCl in Buffer A. The solution was incubated
with 10 ml of Concanavalin A-Sepharose resin (Sigma)
overnight at 4°C. The slurry was packed into a 2.5
cm diameter column and washed with 0.5 M KCl in Buffer
A. The bound CEL I was eluted with 60 ml of 0.3 M
α-D-mannose, 0.5 M KCl in Buffer A at 65°C. The CEL I
was dialyzed against a solution of 25 mM KPO4, 10 µM
PMSF, pH 7.4 (Buffer B), and applied to a
phosphocellulose column that had been equilibrated in
Buffer B. The bound enzyme was eluted with a
linear gradient of KCl in Buffer B. The peak of CEL I
activity from this column was further fractionated by
size on a Superose 12 FPLC column in 0.2 M KCl, 1 mM
ZnCl2, 10 µM PMSF, 50 mM Tris-HCl pH 7.8. The center
of the CEL I peak from this gel filtration step was
used as the purified CEL I in this study. A protein
band of about 34,000 daltons was visible when 5
micrograms of CEL I of the Superose 12 fraction were
visualized with Coomassie Blue staining or
carbohydrate staining (Periodic acid-Schiff base
mediated staining kit, SIGMA Chemicals (5)) on a 15%
polyacrylamide SDS PAGE gel as shown in Figure 1. A
second band of approximately 36,000 daltons was also
visible in the gel. Both bands were stained with the
glycoprotein-specific stain. The subtle mobility
differences observed in the two bands may be due to
differential glycosylation. Alternatively, there may
be a contaminant in the preparation which co-purifies
with CEL I.

Protein determination

Protein concentrations of the samples were
determined by the bicinchoninic acid protein assay (4,
Pierce).

Following purification of CEL I enzyme,
mutational analyses on experimental and clinical DNA
substrates were performed in a suitable gel system.
CEL I recognized and cleaved DNA at a variety of 
mismatches, deletions and insertions. The following 
examples describe in greater detail the manner in 
which mutational analysis is practiced according to 
this invention. 

EXAMPLE II 
Preparation of heteroduplexes 
containing various mismatches 

DNA heteroduplex substrates 64 basepairs
long, containing mismatched basepairs or DNA loops,
were constructed using methods similar to those
reported in Jones and Yeung (34). The DNA loops are
composed of different nucleotides and various loop
sizes as illustrated in Fig. 2. The DNA duplexes were
labeled at one of the four termini so that DNA
endonuclease incisions at the mispaired nucleotides
could be identified as a truncated DNA band on a
denaturing DNA sequencing gel. The oligonucleotides
were synthesized on an Applied Biosystems DNA
synthesizer and purified by using a denaturing PAGE
gel in the presence of 7 M urea at 50°C. The purified
single-stranded oligonucleotides were hybridized with
appropriate opposite strands. The DNA duplex,
containing mismatches or not, was purified by using a
nondenaturing PAGE gel. DNA was eluted from the gel
slice by using electro-elution in a Centricon unit in
an AMICON model 57005 electroeluter. The upper
reservoir of this unit has been redesigned to include
water-tight partitions that prevent
cross-contamination.

EXAMPLE III 
Mismatch endonuclease assay 

Fifty to 100 fmol of 5' [32P]-labeled
substrate described in Example II were incubated with
the Mono Q CEL I preparation in 20 mM Tris-HCl pH 7.4,
25 mM KCl, 10 mM MgCl2 for 30 minutes at temperatures
of 0°C to 80°C. From one half to 2.5 units of
AmpliTaq DNA polymerase was added to the nuclease
assay reaction. Ten µM dNTP was included in the
reaction mixture where indicated (Figures 2 & 5). The
20 µL reaction was terminated by adding 10 µL of 1.5%
SDS, 47 mM EDTA, and 75% formamide plus tracking dyes
and analyzed on a denaturing 15% PAGE gel in 7 M urea
at 50°C. An autoradiogram was used to visualize the
radioactive bands. Chemical DNA sequencing ladders
were included as size markers. Incision sites were
accurately determined by co-electrophoresis of the
incision band and the DNA sequencing ladder in the
same lane.
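
As a quick check on the assay arithmetic, the following Python sketch works out the final concentrations once 10 µL of stop solution has been added to a 20 µL reaction. It uses only the volumes and concentrations stated in this example; the function name `dilute` and the dictionary layout are illustrative choices, not part of the described method.

```python
def dilute(conc, vol_component, vol_total):
    """Concentration of a component after its solution is brought to vol_total."""
    return conc * vol_component / vol_total

REACTION_UL = 20.0  # nuclease reaction volume stated in Example III
STOP_UL = 10.0      # stop solution volume stated in Example III
TOTAL_UL = REACTION_UL + STOP_UL

final = {
    # components supplied by the stop solution
    "SDS (%)": dilute(1.5, STOP_UL, TOTAL_UL),
    "EDTA (mM)": dilute(47.0, STOP_UL, TOTAL_UL),
    "formamide (%)": dilute(75.0, STOP_UL, TOTAL_UL),
    # component supplied by the reaction buffer
    "MgCl2 (mM)": dilute(10.0, REACTION_UL, TOTAL_UL),
}

for name, value in final.items():
    print(f"{name}: {value:.2f}")
```

On these numbers the terminated sample contains about 15.7 mM EDTA against about 6.7 mM MgCl2, so EDTA is in molar excess over the magnesium carried over from the reaction buffer. This is arithmetic only and says nothing about how quickly the enzyme is actually stopped.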

Example IV 

The Effect of Temperature on CEL I Incision
Activity at single-nucleotide DNA
loop and nucleotide substitutions

The CEL I fraction eluted from the Mono Q
chromatography of the celery extract was found to
specifically nick DNA heteroduplexes containing DNA
loops with a single extrahelical guanine (substrate
#2) or thymine residue (#3), but not the perfectly
basepaired DNA duplex #1 as shown in Fig. 3. In these
experiments fifty fmol of heteroduplex #2 (lanes 3-9),
#3 (lanes 10-16), perfectly basepaired duplex #1
(lanes 17-23) and single-stranded DNA substrate (lanes
24-30), each labeled at the 5'-terminus with γ-[32P]
ATP and T4 polynucleotide kinase at about 6000
Ci/mmol, were incubated with 0.5 µL (10 µg) of the
Mono Q fraction of the CEL I preparation in 20 mM
Tris-HCl pH 7.4, 25 mM KCl, 10 mM MgCl2 for 30 minutes
at various temperatures. Each 20 µL reaction was
terminated by adding 10 µL of 1.5% SDS, 47 mM EDTA,
and 75% formamide containing xylene cyanol and
bromophenol blue. Ten µL of the sample was loaded
onto a 15% polyacrylamide, 7 M urea denaturing DNA
sequencing gel at about 50°C, and subjected to
electrophoretic separation and autoradiography as
previously reported (7). The G+A and the T chemical
sequencing reactions were performed as described (7)
and used as size markers. CEL I incision produced
bands about 35 nucleotides long. Lines are drawn
from the positions of the incision bands to the
phosphodiester bonds (I and II) nicked by the
endonuclease in the reference sequencing ladder. For
a 5'-labeled substrate, when a nuclease nicks 5' of a
nucleotide and produces a 3'-OH terminus, the
truncated band runs half a nucleotide spacing slower
than the band for that nucleotide in the chemical DNA
sequencing reaction product lane (34).

Substrate #2 can basepair in two
conformations because the inserted G is within a CGCG
sequence. Therefore either the G residue in the
second or the third nucleotide position can become
unpaired, possibly extrahelical in conformation, when
this duplex is hybridized:

5'-CGGCG-3'    or    5'-CGGCG-3'
3'-G-CGC-5'          3'-GC-GC-5'

Accordingly, two mismatch incision bands were
observed, each correlating to the phosphodiester bond
immediately 3' of the unpaired nucleotide. See Fig.
3, lanes 3-9. This slippage can occur in the target
sequence only when G or C is in the mismatched top
strand. Therefore, the non-paired T residue in
substrate #3 gave one incision band at the same
relative position as the upper band derived from
substrate #2. See Fig. 3, lanes 10-16. These gel
mobilities are consistent with the production of a
3'-OH group on the deoxyribose moiety (6). CEL I
increases in activity with temperature up to 45°C as
illustrated by the increase in band intensity, see
Fig. 3. However, from 65°C to 80°C, specificity is
diminished due to DNA duplex denaturation.

EXAMPLE V

Relative Incision Preferences of CEL I 

To ascertain whether there is a single
endonuclease incision at each DNA duplex, the
experiment described in Fig. 3 was repeated with DNA
labeled on the 3' terminus of the top strand. If
there were only one incision site, initial incision
positions revealed by substrates labeled at the 5' or
the 3' termini should be at the same phosphodiester
bond. In these experiments, substrates were labeled
at the 3' termini with [32P] α-dCTP, cold dGTP and the
Klenow fragment of DNA polymerase I to about 6000
Ci/mmol. The sample preparation, denaturing gel
resolution and autoradiogram analysis are the same as
described in Fig. 3 except that incubation of 50 fmol of
substrate with 10 µg of the CEL I Mono Q fraction was
for 30 minutes at a single temperature, 37°C. The DNA
sequencing ladders for substrates #4 and #5 are shown
in lanes 1-4 to illustrate the DNA sequences used.
Lanes 5-8 had no enzyme during the incubation. Lanes
9-12 are mismatch endonuclease incisions of
substrates #2, #4, #5, #3, respectively. A line is
drawn from the position of the incision band to the
phosphodiester bond (I) nicked by the endonuclease in
the reference sequencing ladder. Lanes 13 and 14
demonstrate the coelectrophoresis of the CEL I
incision band with a chemical DNA sequencing ladder to
accurately determine the incision position.
Relative incision preferences for substrates #2, #3,
#4, and #5 are shown in Fig. 4 for the 3'-labeled
substrates. The mobilities of the incision bands in
lanes 9-12 of Fig. 4 indicate that the incision
reactions had occurred at the phosphodiester bond
immediately 3' of the unpaired nucleotide. Therefore,
the incision site is the same for substrates labeled
either at the 5' or the 3' terminus. The fact that
the DNA incision was found to occur at the same bond
position, whether the substrate DNA was labeled at the
5' termini or the 3' termini, shows that CEL I is not a
DNA glycosylase. A DNA glycosylase mechanism would
cause the DNA incision position in the two DNA
substrates to be one base apart because a base is
excised by the DNA glycosylase.

Precise determination of the incision site was
performed as in the example in lane 14, in which the T
residue chemical sequencing reaction of the labeled
top strand of substrate #2 (lane 13) was mixed with
the CEL I incision product of lane 9 and analyzed in
the same lane. For a 3'-labeled substrate, when a
nuclease nicks 3' of a nucleotide and produces a 5'-PO4
terminus, the truncated band runs with the band for
that nucleotide in the chemical DNA sequencing
reaction product lane (7). Moreover, the gel
mobility, relative to the size standards of chemical
DNA sequencing, illustrated that the DNA nick produced
a 5'-phosphorylated terminus (6). For a DNA loop with
a single nucleotide insertion, the nuclease
specificity is A > G > T > C. It can be seen in Fig.
4A that a small amount of 5' to 3' exonuclease
activity is present in this CEL I preparation.

To test whether CEL I can cut in the bottom
strand across from a DNA loop of one nucleotide in the
top strand, or whether nicking of the loop-containing
strand may lead to secondary CEL I incision across
from the nick, the bottom strand that contains no
unpaired nucleotides in substrate #2 was labeled at
the 3' end and incubated in the presence of CEL I.
The extrahelical nucleotide in the top strand, or the
DNA nick made by CEL I in the top strand of substrate
#2, seen in lane 9 of Fig. 4, did not lead to
significant nicking of the bottom strand (lane 18).
As a control against the possibility that a DNA sequence
effect may favor CEL I incision in the top strand and
not the bottom strand, CEL I was tested for incision
of the bottom strand in the C/C mismatch substrate in
lanes 15 and 16. Mismatch incision was made when CEL
I was present in lane 16.

In the characterization of the incision site
of a repair endonuclease, it is important to determine
whether one or two incisions have been made for each
lesion. This is normally accomplished by using
lesion-containing substrates that have been labeled,
in turn, at the four termini of a DNA duplex. This
test has been satisfied in the analysis of substrate
#2 by using three labeled substrates because of the
near absence of incision in the bottom strand. In
Fig. 3, lanes 4-7, and Fig. 4, lane 9, the incisions
of this substrate in the 5'-labeled and the
3'-labeled forms, respectively, have been compared. The
incision site was found to be at the 3' side of the
mismatched nucleotide in both cases. The lack of
incision on the bottom strand for substrate #2 was
demonstrated in lane 18 of Fig. 4. Only the 5'
labeled substrate was needed in this case since no
significant bottom strand incision had occurred.

Example VI 

Effect of AmpliTaq DNA polymerase on the 
incisions at DNA loop mismatches 

CEL I activity is stimulated by the presence
of a DNA polymerase. In Fig. 5, the CEL I incisions
at single-nucleotide loop substrates were stimulated
by AmpliTaq DNA polymerase to different extents
depending on which nucleotides are present in the
loop. It was necessary to use different amounts of
CEL I to illustrate the AmpliTaq DNA polymerase
stimulation. The stimulation of the incision at
extrahelical C and extrahelical T substrates is best
illustrated in Figs. 5A & B (compare lanes 4 with
lanes 9, and lanes 5 with lanes 10, in the respective
panels) where higher CEL I levels are required to show
good incision at these mismatches. For extrahelical G
and extrahelical A substrates that are among the best
substrates for CEL I, AmpliTaq DNA polymerase
stimulation can best be illustrated by using a much
lower level of CEL I as in Fig. 5. The amounts of
AmpliTaq stimulation of CEL I in Fig. 5 were
quantified and presented in Table I.

Table I

Quantification of the CEL I incision bands
shown in the autoradiogram in Fig. 5.

Substrate                  Panel, lane #   Counts   Panel, lane #   Counts   +/-

Extrahelical G, band I     A, 2            20894    A, 7            22101     1.1
Extrahelical A, band I     A, 3            19451    A, 8            26357     1.4
Extrahelical C, band I     A, 4             4867    A, 9            12009     2.5
Extrahelical T, band I     A, 5             2297    A, 10           25230    11.0
Extrahelical G, band I     B, 2            12270    B, 7            19510     1.6
Extrahelical A, band I     B, 3            10936    B, 8            24960     2.3
Extrahelical C, band I     B, 4             1180    B, 9             2597     2.2
Extrahelical T, band I     B, 5              700    B, 10           21086    30.1
Extrahelical G, band I     C, 11           10409    C, 13           18649     1.8
Extrahelical G, band II    C, 11            9020    C, 13           19912     2.2
Extrahelical A, band I     C, 12            7165    C, 14           14983     2.1

The autoradiograms were quantified in two dimensions with an
AMBIS densitometer and the amount of signal in each band is given
as counts.
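
The last column of Table I can be reproduced from the counts. The Python sketch below is an illustration, assuming that the first lane of each pair is the reaction without AmpliTaq DNA polymerase and the second is the reaction with it, and that the "+/-" column is the ratio of the two counts rounded to one decimal place. The variable and function names are hypothetical.

```python
# Rows of Table I: (substrate, band, counts in first lane, counts in second lane).
# Assumption: first lane = without AmpliTaq, second lane = with AmpliTaq.
TABLE_I = [
    ("G", "I", 20894, 22101),
    ("A", "I", 19451, 26357),
    ("C", "I", 4867, 12009),
    ("T", "I", 2297, 25230),
    ("G", "I", 12270, 19510),
    ("A", "I", 10936, 24960),
    ("C", "I", 1180, 2597),
    ("T", "I", 700, 21086),
    ("G", "I", 10409, 18649),
    ("G", "II", 9020, 19912),
    ("A", "I", 7165, 14983),
]

def fold_stimulation(counts_minus, counts_plus):
    """Ratio of band counts with polymerase to band counts without it."""
    return counts_plus / counts_minus

ratios = [round(fold_stimulation(minus, plus), 1) for _, _, minus, plus in TABLE_I]

for (base, band, minus, plus), ratio in zip(TABLE_I, ratios):
    print(f"Extrahelical {base}, band {band}: {minus} -> {plus}  ({ratio:.1f}-fold)")
```

Computed this way, the ratios agree with the tabulated values, with the extrahelical T substrate showing the largest apparent stimulation in panels A and B.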



Example VII

Optimum pH of CEL I Activity 

The pH optimum of CEL I for the extrahelical
G substrate was investigated in the absence or
presence of the AmpliTaq DNA polymerase. CEL I
(9.5 ng) was incubated with 100 fmol of the substrate
in a 20 µL reaction in buffers of pH 5-6.5 (imidazole)
and pH 7-9.5 (Tris-HCl) for 30 minutes at 37°C.
AmpliTaq DNA polymerase was omitted from the
incubations in the top panel (- polymerase); one half
unit was present in those in the bottom panel
(+ polymerase). As shown
in Fig. 6, CEL I was found to be active from pH 5.0 to
pH 9.5, and showed a broad pH optimum centered about
pH 7.5 (top panel). When AmpliTaq DNA polymerase was
present, the incision was stimulated across the whole
pH range (bottom panel). The assay method did not use
initial kinetics and thus precluded quantitative
conclusions on this pH profile of CEL I. However, it
is clear that the enzyme works very well in the
neutral pH ranges.

Example VIII 
Incisions by CEL I at basepair substitutions

Other combinations of mismatched substrates
are also recognized by CEL I and incised on one of the
two DNA strands of each DNA duplex. Some of these
substrates are less efficiently incised compared with
those containing DNA loops; therefore 45°C was used
for incubation instead of 37°C. Substrates with the
5' termini of the top strands labeled were used in
this study. The autoradiogram of Fig. 7 shows that
mismatches containing a C residue are the preferred
mismatch substrates with C/C often better than C/A and
C/T. The incisions at these mismatches tend to
produce two alternate incision positions, one at the
phosphodiester bond 3' of the mismatched C residue,
one at the phosphodiester bond one nucleotide further
removed in the 3' direction. Whether alternate
incision sites will be observed for these mismatches
within another DNA sequence context has not been
investigated. One possible explanation for this
phenomenon may be greater basepair destabilization
next to a mismatch that contains a C residue than for
other base-substitutions. Alternatively, the specific
mismatched nucleotide may shift one position to the 3'
side because the next nucleotide is also a C residue
and the two residues can exchange their roles in the
pairing with the G residue in the opposite DNA strand.
For base substitution mismatched basepairs, CEL I
specificity in the presence of AmpliTaq DNA
polymerase, with respect to the top strand, is C/C >
C/A ~ C/T > G/G > A/C ~ A/A ~ T/C > T/G ~ G/T ~ G/A ~
A/G > T/T (Fig. 7A). Because eubacterial DNA
polymerases are known to incise at unusual DNA
structures (8), a test was conducted to determine
whether AmpliTaq DNA polymerase by itself will incise
at the 13 substrates used in Fig. 7. Under extended
exposure of the autoradiogram, no mismatch incision by
the AmpliTaq DNA polymerase was observed (Fig. 7B).

Example IX 

Detection of DNA mutations 
Using CEL-I and Multiplex Analysis 

The sensitivity of CEL I for mismatch
detection is illustrated by its ability to detect
mutations in pooled DNA samples. DNA was obtained
from peripheral blood lymphocytes from individuals
undergoing genetic screening at the Fox Chase Cancer
Center. Samples were obtained from breast
cancer-only, ovarian cancer-only, breast/ovarian cancer
syndrome families or from non-breast/ovarian cancer
control samples. Unlabeled primers specific for exon
2 of BRCA1 were utilized to PCR amplify this region of
the gene. The wild-type PCR products of exon 2 were
labeled with gamma 32P-ATP. Briefly, 10 picomoles of
PCR product were purified by the Wizard procedure
(Promega). Exon 2 wild-type products were then
phosphorylated using T4 kinase and 15 picomoles of
gamma 32P-ATP at 6,000 Ci/mmol in 30 µl 1X kinase
buffer (70 mM Tris-HCl (pH 7.6), 10 mM MgCl2, 5 mM
dithiothreitol) at 37°C for 1 hour. The reactions
were stopped with 1 µl 0.5 M EDTA. The reaction
volume was brought up to 50 µl with 1X STE buffer (100
mM NaCl, 20 mM Tris-HCl, pH 7.5, 10 mM EDTA) and
processed through a Pharmacia Probe Quant column.
Labeled DNA (1 pmol/µl in 100 µl) was then used for
hybridization with individual unlabeled PCR amplified
experimental samples. For each individual sample, 100
fmol of the unlabeled PCR amplified product was
incubated with 200 fmol of the 32P-labeled wild-type
PCR product in CEL I reaction buffer (25 mM KCl, 10 mM
MgCl2, 20 mM Tris-HCl, pH 7.5). Following denaturation
and renaturation, heteroduplexed, radiolabeled PCR
products were exposed to CEL I for 30 minutes at 37°C
in 1X CEL I reaction buffer and stopped via the addition
of 10 µl stop mix (75% formamide, 47 mM EDTA, 1.5%
SDS, xylene cyanol and bromophenol blue). The
heteroduplexes were treated with the enzyme
individually (lanes 4-13) or pooled in one sample tube
(lane 14) and treated. The products of the reaction
were loaded onto a 15% polyacrylamide gel containing 7
M urea and the results are shown in Fig. 8. Out of
the 10 samples analyzed, 2 contained an AG deletion
(lanes 4 and 7), 2 contained an 11 base-pair loop
(lanes 8 and 9), and the other 6 were wild type (lanes
5, 6, 10, 11, 12, and 13). Cleavage by CEL I at the
AG deletion resulted in the formation of two bands,
one of approximately 151 nucleotides from the top
strand, the other at 112 nucleotides from the bottom
strand (lanes 4 and 7). Cleavage by CEL I at 11
base-pair loops resulted in the formation of one band at
147 nucleotides from the top strand, and a group of
bands at 109 nucleotides in the bottom strand (lanes 8
and 9). Lanes 1, 2 and 3 contain DNA that was not
exposed to CEL I as negative controls; lane 15
contains 64 and 34 bp nucleotide markers. As can be
seen in lane 14 of the gel, when the samples were
pooled and exposed simultaneously to CEL I, the enzyme
cleaved at all of the above listed mutations with no
loss of specificity. Also, the PCR products of the
wild-type samples showed no non-specific DNA nicking.

To further illustrate the ability of CEL I
to detect mutations in pooled DNA samples, 1, 2, 3, 5,
10 or 30 heteroduplexed, radiolabeled PCR products
(again amplified from exon 2 of the BRCA1 gene) were
exposed to CEL I in a single reaction tube and the
products run on a 6% polyacrylamide gel containing 7 M
urea. Samples were amplified and radiolabeled as
described above. Each pool contained only one sample
which had a mutation (AG deletion). The other samples
in each pool were wild-type. Lanes 1 and 2 contain
control samples which were not exposed to CEL I. In
the pooled samples where a mutation was present, CEL I
consistently cleaved the PCR products, illustrating the
sensitivity of the enzyme in the presence of excess
wild-type, non-mutated DNA (lanes 4, 5, 6, 7, 8, 9,
and 11). As a control, heteroduplexed PCR products
containing no mutations were analyzed and no cut band
corresponding to a mutation appeared (Fig. 9, lanes 3
and 10).

EXAMPLE X 

Detection of Mutations and Polymorphisms by
CEL-I in Samples Obtained from High Risk Families

PCR primer sets specific for the exons in
the BRCA1 gene have been synthesized at Fox Chase
Cancer Center. The gene sequence of BRCA1 is known.
The exon boundaries and corresponding base numbers are
shown in Table II. Primers to amplify desired
sequences can be readily designed by those skilled in
the art following the methodology set forth in Current
Protocols in Molecular Biology, Ausubel et al., eds.,
John Wiley and Sons, Inc. (1995). These primers were
designed such that in each PCR reaction, one primer is
labeled at the 5' terminus with a fluorescent label,
6-FAM, while the other primer is similarly labeled with
a label of another color, TET. A PCR product will
thus be labeled with two colors such that DNA nicking
events in either strand can be observed independently
and the measurements corroborated. A summary of the
results is presented in Table III.

TABLE II

EXON BOUNDARIES AND CORRESPONDING
BASE NUMBERS IN BRCA1

EXON    BASE #'s

1       1 - 100
2       101 - 199
3       200 - 253
5       254 - 331
6       332 - 420
7       421 - 560
8       561 - 665
9       666 - 712
10      713 - 788
11      789 - 4215
11B     789 - 1591
11C     1454 - 2459
11A     2248 - 3290
11D     3177 - 4215
12      4216 - 4302
13      4303 - 4476
14      4477 - 4603
15      4604 - 4794
16      4795 - 5105
17      5106 - 5193
18      5194 - 5273
19      5274 - 5310
20      5311 - 5396
21      5397 - 5451
22      5452 - 5526
23      5527 - 5586
24      5587 - 5711

Fig. 10 depicts a schematic of the exons present
in the BRCA1 gene. Peripheral blood samples from
individuals in high risk families were collected and
the DNA isolated. The PCR products were amplified
using Elongase (BRL) and purified using Wizard PCR
Preps (Promega). The DNA was heated to 94°C and
slowly cooled in 1X CEL I buffer (20 mM Tris-HCl pH
7.4, 25 mM KCl, 10 mM MgCl2) to form heteroduplexes.
The heteroduplexes were incubated in 20 µl 1X CEL I
buffer with 0.2 µl of CEL I and 0.5 units of AmpliTaq
at 45°C for 30 minutes. The reactions were stopped
with 1 mM phenanthroline and incubated for an
additional 10 minutes at 45°C. The sample was
processed through a Centricep column (Princeton
Separations) and dried down. One microliter of ABI
loading buffer (25 mM EDTA, pH 8.0, 50 mg/ml Blue
dextran), 4 µl deionized formamide and 0.5 µl TAMRA
internal lane standard were added to the dried DNA
pellet. The sample was heated at 90°C for 2 minutes
and then quenched on ice prior to loading. The sample
was then loaded onto a 4.25% denaturing 34 cm
well-to-read acrylamide gel and analyzed on an ABI 373
Sequencer using GENESCAN 672 software. The 6-FAM
labelled primer in this experimental sample was at
nucleotide 3177 of the BRCA1 cDNA (region 11D); the
TET labelled primer was 73 nucleotides into the intron
between exon 11 and exon 12. Each spike represents
the presence of a DNA band produced by the cleavage of
the heteroduplex by CEL I where a mutation or a
polymorphism is present. One spike represents the
size of the CEL I produced fragment from the 3' side
of the mismatch site to the 5' 6-FAM label of the top
strand. The other spike represents the corresponding
fragment in the bottom strand from the 3' side of the
mismatch to the 5' TET label. The sum of the two
fragments equals one base longer than the length of
the PCR product. The 6-FAM panel shows a spike at
base #645 from the 6-FAM label and the TET panel shows
a spike at base #483 from the TET label, both
corresponding to the site of the 5 base deletion at
nucleotide 3819 of the BRCA1 cDNA (Fig. 11).

Analysis of exon 11 in another individual
was performed using a 6-FAM-labelled primer at
nucleotide 1454 of the BRCA1 cDNA (Fig. 12). The
TET-labelled primer was at nucleotide 2459 (region 11C).
The PCR products were amplified and prepared
as described above. In this individual, the 6-FAM
panel shows a spike at base #700 and the TET panel
shows a spike at #305, each spike corresponding to the
site of CEL I incision in the respective DNA strand at
a nonsense mutation of A>T at nucleotide 2154 of the
BRCA1 cDNA. The 6-FAM panel also shows a spike at
base #747 and the TET panel shows a spike at #258
corresponding to the site of a polymorphism C>T at
nucleotide 2201 of the BRCA1 cDNA. The nonsense
mutation and polymorphism have been confirmed by
sequencing of this particular sample (KO-11) using the
ABI 377 Sequencer. Spikes that are marked with an
asterisk are also present in the no enzyme control
lane and represent PCR product background.
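
The spike positions quoted for region 11C follow a simple additive pattern, sketched below in Python. This is a minimal sketch and not part of the original disclosure. It assumes, as an inference from the numbers quoted above rather than from any formula stated in the text, that the cDNA position of the cut equals the 6-FAM primer position plus the 6-FAM fragment length and, equivalently, the TET primer position minus the TET fragment length. The names Amplicon and locate_variant and the tolerance parameter are hypothetical, and the relationship is only approximate where insertions or deletions shift fragment sizes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Amplicon:
    """Primer positions on the cDNA for one dual-labelled PCR product."""
    fam_primer_pos: int  # cDNA position of the 6-FAM (top strand) primer
    tet_primer_pos: int  # cDNA position of the TET (bottom strand) primer


def locate_variant(amplicon, fam_fragment_len, tet_fragment_len, tolerance=2):
    """Estimate the variant position independently from each labelled fragment.

    Returns (position_from_fam, position_from_tet) and raises ValueError
    if the two estimates differ by more than `tolerance` bases.
    """
    from_fam = amplicon.fam_primer_pos + fam_fragment_len
    from_tet = amplicon.tet_primer_pos - tet_fragment_len
    if abs(from_fam - from_tet) > tolerance:
        raise ValueError(
            f"6-FAM and TET estimates disagree: {from_fam} vs {from_tet}"
        )
    return from_fam, from_tet


# Region 11C primers and spike sizes as quoted in the text above.
region_11c = Amplicon(fam_primer_pos=1454, tet_primer_pos=2459)

if __name__ == "__main__":
    print(locate_variant(region_11c, 700, 305))  # nonsense mutation
    print(locate_variant(region_11c, 747, 258))  # polymorphism
```

Under these assumptions both spike pairs resolve to the same positions reported in the text (2154 and 2201), which illustrates why the two colors give independent confirmation of one site.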

Certain individuals have mutations in
another region of exon 11, region 11A, on the
schematic in Fig. 10. A 6-FAM-labelled primer at
nucleotide 2248 of the BRCA1 cDNA and a TET-labelled
primer at nucleotide 3290 were used to amplify this
region of exon 11. Following amplification, the
samples were processed as described above. The four
6-FAM panels represent CEL I reactions with 4
different individual samples. The first panel, Fig.
13A, sample #KO-2, shows one spike at #182
corresponding to the site of a polymorphism T>C at
nucleotide 2430 and a second spike at #483
corresponding to the site of another polymorphism C>T
at nucleotide 2731. The second panel, Fig. 13B,
sample #KO-3, shows only the second polymorphism. The
third panel, Fig. 13C, sample #KO-7, shows no
polymorphism. The fourth panel, Fig. 13D, sample
#KO-11, shows two spikes corresponding to the two
polymorphisms. It is interesting to note that this
sample, KO-11, shows up positive for a nonsense
mutation and a polymorphism in the region of exon 11C
corresponding to nucleotides 1454-2459 as described
above.



TABLE III

SUMMARY OF BRCA1 MUTATIONS
AND POLYMORPHISMS DETECTED BY CEL I

EXON    NUCLEOTIDE POSITION #    TYPE OF MUTATION

2       185                      AG deletion
2       188                      11 base deletion
11 C    2154                     A > T
11 D    3819                     5 base deletion
11 D    4168                     A > G
11 D    4153                     A deletion
11 D    4184                     4 base deletion
20      5382                     C insertion

EXON    NUCLEOTIDE POSITION #    TYPE OF POLYMORPHISM

11 B    1186                     A > G
11 C    2201                     T > C
11 A    2430                     T > C
11 A    2731                     C > T
11 D    3667                     A > G

Table IV sets forth the 5' and 3' flanking 
sequences surrounding the mutations detected by CEL I
in the present invention. While not exhaustive, it 
can be seen from the variability of the flanking 
sequences surrounding these mutations and 
polymorphisms that CEL I sensitivity and recognition 
of mismatched DNA heteroduplexes does not appear to be 
adversely affected by flanking sequences. 
TABLE IV

EFFECT OF FLANKING SEQUENCES ON ENDONUCLEASE ACTIVITY OF CEL I

nucleotide                            5' flanking        3' flanking
position    EXON    type of change    sequence           sequence

185         2       AG deletion       5' ATCTT 3'        5' AGTGT 3'
                                         TAGGA              TCACA

188         2       11 bp deletion    5' TTAGA 3'        5' G (the next 4 bp
                                         AATCT              are in intron)

1186        11 B    A --> G           5' TAAGC 3'        5' GAAAC 3'
                                                            CTTTG

2154        11 C    A --> T           5' GAGCC 3'        5' AGAAG 3'
                                                            TCTTC

2201        11 C    T --> C           5' GACAG 3'        5' GATAC 3'
                                         CTGTC              CTATG

2430        11 A    T --> C           5' AGTAG 3'        5' AGTAT 3'
                                         TCATC              TCATA

2731        11 A    C --> T           5' TGCTC 3'        5' GTTTT 3'
                                         ACGAG              CAAAA

3667        11 D    A --> G           5' CAGAA 3'        5' GGAGA 3'
                                         CTCTT              CCTCT

3819        11 D    5 bp deletion     5' GTAAA 3'        5' CAATA 3'
                                         CATTT              GTTAT

4153        11 D    A deletion        5' TGATG 3'        5' AGAAA 3'
                                         ACTAC              TCTTT

4184        11 D    4 bp deletion     5' AATAA 3'        5' GAAGA 3'
                                         TTATT              CTTCT

4168        11 D    A --> G           5' AACGG 3'        5' CTTGA 3'
                                         TTGCC              GAACT

5382        20      C insertion       5' ATCCC 3'        5' AGGAC 3'
                                         TAGGG              TCCTG
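
Each flanking sequence in Table IV is shown as a duplex, with the top strand written 5' to 3' and the base-paired bottom strand beneath it. The Python sketch below is illustrative only and not part of the original disclosure; the names bottom_strand and is_paired are hypothetical. It derives the bottom strand from a top strand by Watson-Crick pairing, which can be used to check a transcribed duplex for consistency.

```python
# Watson-Crick pairing for DNA, position by position (no reversal), matching
# the table layout in which the bottom strand is written beneath the top one.
_PAIRING = str.maketrans("ACGT", "TGCA")


def bottom_strand(top):
    """Return the strand that base-pairs with `top`, aligned under it."""
    top = top.upper()
    if set(top) - set("ACGT"):
        raise ValueError(f"not a DNA sequence: {top!r}")
    return top.translate(_PAIRING)


def is_paired(top, bottom):
    """True if `bottom` is the exact base-paired partner of `top`."""
    return bottom_strand(top) == bottom.upper()


if __name__ == "__main__":
    # A few duplexes as they appear in the table.
    for top, bottom in [("AGTGT", "TCACA"), ("TTAGA", "AATCT"), ("ATCCC", "TAGGG")]:
        print(top, bottom, is_paired(top, bottom))
```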



As can be seen from the above described 
examples, utilization of CEL I has distinct advantages 
over methods employing other mismatch repair systems 
during analysis of mutations in the clinical setting. 
These advantages are summarized in Table V. 
EXAMPLE XI

As noted above, many plant species
synthesize efficient endonuclease enzymes. In
accordance with the present invention, a novel
endonuclease, ARA I, has been isolated from Arabidopsis
thaliana. This endonuclease is quite similar to CEL I
in many respects. Arabidopsis thaliana is considered
to provide a model system for studies in plant
molecular biology and biochemistry. Advantages of
the Arabidopsis system include a short life cycle of
about 26 days, the small size of the plant, the
diploid nature of the genome, and especially, the
small size of the genome compared with most higher
plants and animals. The Arabidopsis genome, at 7 x 10^7
basepairs, is only about 10 times larger than that of
E. coli (4 x 10^6 basepairs), making genetic cloning of
the mismatch endonuclease and genetic manipulation
substantially easier than in higher plants and humans,
both containing about 2 x 10^9 basepairs. Thus the
finding of the mismatch endonuclease ARA I in
Arabidopsis, and the ability to use ARA I to perform
mutation detection, are important steps leading to
the application of these mismatch endonucleases in
mutation detection.

Preparation of highly purified ARA I

The purification procedure of ARA I is very 
similar to that disclosed for CEL I. This is not 
unexpected as the data indicate that the two enzymes 
are substantially similar. 

Two hundred and fifty grams of callus of
Arabidopsis thaliana ecotype Columbia were grown on
minimal salt agar and stored frozen. The callus was
resuspended in a Waring blender in Buffer A (0.1 M
Tris-HCl, pH 8.0 with 10 micromolar
phenylmethanesulfonyl fluoride (PMSF)). The suspended
cells were broken by two passages through a French
Pressure cell at 20,000 PSIG to produce the crude
lysate. The crude lysate was cleared by
centrifugation, and the supernatant was adjusted to
25% saturation in ammonium sulfate with solid ammonium
sulfate at 4°C. After two hours, the solution was
centrifuged, and the supernatant was adjusted to 85%
saturation in ammonium sulfate. The solution was again
centrifuged and the pellet was dissolved in 160 ml of
Buffer A containing 0.5 M NaCl. Five ml of Con A resin
was added to this solution and the mixture was tumbled
overnight at 4°C for 3 hours. The slurry was packed
into a 2.5 cm diameter column and washed with 0.5 M KCl
in Buffer A. The bound ARA I was eluted with about 60
ml of 0.5 M alpha-methylmannoside, 0.5 M NaCl, in
Buffer A at 4°C. The eluted ARA I was dialyzed
against a solution of Buffer B (25 mM KPO4, 10
micromolar PMSF, pH 7.4) and applied to a
phosphocellulose column equilibrated in Buffer B. The
bound enzyme was eluted with a gradient of KCl in
Buffer B. The eluted peak from P-11 was dialyzed
against Buffer A and concentrated by passage through a
column of Mono Q anion exchanger. The ARA I eluted
from the Mono Q step was purified several
thousandfold; however, the preparation was not yet
homogeneous.

The protein composition of the various
purification steps was analyzed by 4% to 20%
polyacrylamide gradient SDS gel electrophoresis. In
the gel, which is shown in Figure 14, the lanes are as
follows: 1. extract preparation; 2. ammonium sulfate
precipitation; 3. Con-A Sepharose affinity column
chromatography; 4. phosphocellulose P-11
chromatography; and 5. final purification over a DEAE
Sephacel anion exchange column, giving rise to an over
10,000-fold purification of ARA I. Protein in the gel
was visualized by staining with Coomassie Blue
R-250.

PCR products were amplified using Amplitaq
(Perkin-Elmer) and purified using Wizard PCR Preps
(Promega). The DNA was heated to 94°C and slowly
cooled in 1X ARA I buffer (20 mM Tris-HCl, pH 7.4, 25
mM KCl, 10 mM MgCl2) to form heteroduplexes. The
heteroduplexes were incubated in 20 µl 1X ARA I buffer
with 0.2 µl ARA I (0.01 µg) and 0.5 units of Amplitaq
(Perkin-Elmer) at 45°C for 30 minutes. The reactions
were stopped with 1 mM phenanthroline and incubated
for an additional 10 minutes at 45°C. The samples
were processed through a Centricep column (Princeton
Separations) and dried down. One microliter of ABI
loading buffer (25 mM EDTA, pH 8.0, 50 mg/ml Blue
dextran), 4 µl deionized formamide and 0.5 µl TAMRA
internal lane standard were added to the dried DNA
pellet. The sample was heated at 90°C for 2 minutes
and then quenched on ice prior to loading. The sample
was then loaded onto a 4.25% denaturing 34 cm
well-to-read acrylamide gel and analyzed on an ABI 373
Sequencer using GeneScan 672 Software. The vertical
axis of the electropherogram is relative fluorescence
units. The horizontal axis of the electropherogram is
DNA length in nucleotides.

Throughout purification, the presence of
mismatch endonuclease activity catalyzed by ARA I was
readily observable. Figure 15 shows an autoradiogram
of denaturing DNA sequencing gel analysis of ARA I-cut
mismatched substrates. The lanes in the gel
correspond to the lanes containing the various stages
of purified ARA I shown in Figure 14. In Figure 15,
Panels A, B, C illustrate the ARA I cutting of
substrate #2 with the extrahelical G nucleotide,
substrate #4 with the extrahelical A nucleotide, and
substrate #18, a no-mismatch control substrate,
respectively. F refers to full length substrates
whereas I refers to ARA I cut substrates which
produced a fragment 35 nucleotides long.

Preparation of GeneScan targets

GeneScan targets were prepared as described
for the CEL I studies. 

Peripheral blood samples from individuals at
high risk for breast/ovarian cancer were collected and
the isolated DNA was used as PCR template. PCR primers
specific for the exons in the BRCA1 gene were
synthesized with a 6-FAM dye at the 5' end of the
forward primer, and a TET dye at the 5' end of the
reverse primer. PCR products for an exon were
hybridized to form heteroduplexes and reacted with ARA
I. The products were resolved by the ABI 373
Automated DNA Sequencer and analyzed by GeneScan 672
software.

A schematic diagram of the ARA I mismatch
detection assay is shown in Figure 16. The PCR
products of a wild type BRCA1 allele and a mutant
BRCA1 allele (with an AG deletion in this example)
were mixed. After denaturation by heat, and
reannealing, heteroduplexes were formed such that in
some of them, the extra AG bases formed a loop in the
top strand. In others, the extra CT bases formed a
loop in the bottom strand. The looped strands were
color labeled by having a color dye marker at the 5'
termini of each PCR primer used to create the DNA
fragment. ARA I cuts the loops at the 3' side of the
mismatch, similar to the mismatch cutting by CEL I. As
a result, a truncated blue (6-FAM) band and a
truncated green (TET) band are produced. The lengths
of these two bands independently pinpoint the
location of the mutation in the fragment.
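
As a rough guide to how much of a mixed sample becomes cleavable heteroduplex, the Python sketch below estimates the duplex species formed after denaturation and reannealing. It is a minimal sketch under idealized assumptions that are not stated in the original text: strands reanneal at random, both alleles anneal equally well, and the wild type allele makes up a known fraction of the PCR product. The function name duplex_fractions and the dictionary keys are hypothetical.

```python
def duplex_fractions(wild_type_fraction):
    """Expected duplex species after random reannealing of two alleles.

    `wild_type_fraction` is the share of wild type PCR product in the mix
    (0.5 for an equal mixture). Idealized model: random strand pairing.
    """
    if not 0.0 <= wild_type_fraction <= 1.0:
        raise ValueError("wild_type_fraction must be between 0 and 1")
    p = wild_type_fraction   # wild type strands (carry the AG / CT bases)
    q = 1.0 - p              # mutant strands (AG deleted)
    return {
        "wild type homoduplex": p * p,
        "mutant homoduplex": q * q,
        # top strand wild type, bottom strand mutant: extra AG loops out on top
        "heteroduplex, AG loop in top strand": p * q,
        # top strand mutant, bottom strand wild type: extra CT loops out below
        "heteroduplex, CT loop in bottom strand": q * p,
    }


if __name__ == "__main__":
    for species, fraction in duplex_fractions(0.5).items():
        print(f"{species}: {fraction:.2f}")
```

In this idealized model an equal mixture yields one quarter of each species, so only about half of the duplexes carry a loop that the endonuclease can cut, and uncut full length product is expected in every lane.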

An illustration of the data obtained from
GeneScan analysis based on ARA I cutting of a
heteroduplex containing a mismatch is shown in Figure
17. The resolution of the ARA I treated DNA fragments
is done in a denaturing polyacrylamide gel in an
automated DNA sequencer Model 373 of Perkin Elmer. In
the denaturing polyacrylamide gel, the fastest running
fluorescent signals are the residual PCR primers. The
two ARA I fragments, one blue (6-FAM) and one green
(TET), follow at their respective size positions. The
slowest migrating band is the uncut full length PCR
product. These bands are displayed by the GeneScan
software as peaks in the fluorogram at the bottom of
Figure 17. The GeneScan software, using molecular
weight standards of red color (TAMRA) run within the
same lane, identifies the sizes of the blue band and
the green band. The lengths of the ARA I generated
fragments from the two colored ends of the PCR product
independently pinpoint the location of the mutation.

Figures 18-21 provide a simultaneous
comparison of GeneScan mutation detection of ARA I
versus CEL I, using the same BRCA1 gene PCR products
described in the previous examples. As can be seen
from the data, CEL I and ARA I appear to have
identical enzymatic activity on the substrates tested.
Thus ARA I, like CEL I, is an important new
endonuclease that can be used to facilitate the
identification of mutations in DNA. As such, the
enzyme provides a valuable addition to the arsenal of
reagents utilized in genetic screening assays.

EXAMPLE XII

In support of the claim that the mismatch
endonucleases in Arabidopsis, and indeed in most
plants, are substantially similar to that claimed for
CEL I of celery, the data for the mismatch detection
by the extracts of 10 plants, besides Arabidopsis, is
presented. The indicated plant extracts, namely,
alfalfa, mung bean, cabbage, cauliflower, Cha ha,
lettuce, parsley, celery-cabbage, tomato and broccoli,
were made by homogenizing one part plant with one part
Buffer A in a Waring blender. As shown in Figure 22,
one microliter of the crude homogenate was assayed
with top strand 5' 32P labeled substrate #4
(extrahelical A substrate). Lanes G+A and T are Maxam
and Gilbert DNA sequencing ladders used to determine
the exact cut site in the top strand. The mismatch
endonuclease cut produced a 35 nucleotide long
fragment visible in lanes 1 to 11. The activity was
seen in root, shoot, stalk, leaves, flower, and fruit
of these plants, illustrating its ubiquitous nature.
Lanes 12-22, which correspond to lanes 1-11, serve as
negative controls using no-mismatch substrate #1. Cha
ha is a Vietnamese vegetable that resembles celery
stalk and is rich in nucleases.

To further illustrate the ubiquitous nature
of mismatch endonuclease activity similar to ARA I and
CEL I, extracts of a group of 11 plants (including all
of the plants from Figure 22, plus asparagus) were
analyzed for enzymatic activity. See Figure 23. The
plant extracts identified in Figure 22 were used to
cut mismatch substrate #2 (extrahelical G
substrate). One microliter of the crude homogenate was
assayed with top strand 5' 32P labeled substrate #2.
Lanes G, G+A, C, and T are Maxam and Gilbert DNA
sequencing ladders used to determine the exact cut
site in the top strand. The mismatch endonuclease cut
produced 34 and 35 nucleotide long fragments visible
in lanes 1 to 17. Two bands were seen for the
mismatch cut because of mismatch slippage at the two
consecutive G residues in the mismatched substrate.
The activity was seen in root, shoot, stalk, leaves,
flower, and fruit of these plants. Lanes 12-22
correspond to lanes 2-11, but with the no-mismatch
substrate #1. Lanes 14-17 illustrate that the mismatch
endonuclease activities from four of the plants are
mannosyl proteins similar to CEL I and ARA
I. These activities were bound to a ConA-Sepharose
resin and then eluted with mannose buffer. Lanes 18-21
are controls of lanes 14-17 except substrate #1, with
no mismatch, was used. It is clear from Figures 22 and
23 that the CEL I and ARA I mismatch endonucleases are
substantially similar and conserved among plants
in terms of mismatch cutting ability, abundance of
activity, and the mannosyl nature of the
glycoproteins.

The description and examples set forth above
relate to preferred embodiments of the invention.
Other embodiments may be apparent to those skilled in
the art. Therefore, the invention is not limited to
the particular embodiments described and exemplified,
but may be capable of modification or variation
without departing from the spirit of the invention,
the full scope of which is delineated by the appended
claims.
WHAT IS CLAIMED IS:

1. A method for determining a mutation in 
a target sequence of single stranded polynucleotide 
with reference to a non-mutated sequence of a 
polynucleotide that is hybridizable with the 
polynucleotide including said target sequence, wherein 
said sequences are amplified, labeled with a 
detectable marker, hybridized to one another, exposed 
to endonuclease and analyzed for the presence of said 
mutation, the improvement comprising the use of a 
mismatch endonuclease, the activity of said 
endonuclease comprising: 

a) detection of all mismatches between 
said hybridized sequences; 

b) recognition of sequence differences
in polynucleotide strands between about 100 bp and
about 3 kb in length; and

c) recognition of said mutation in a
target polynucleotide sequence without substantial
adverse effect caused by flanking polynucleotide
sequences.

2. The method as claimed in claim 1 wherein 
said endonuclease is derived from celery. 

3. The method as claimed in claim 1
wherein said polynucleotide is DNA.

4. The method as claimed in claim 2
wherein the sequences exposed to said endonuclease are
also exposed to a protein selected from the group
consisting of DNA ligase, DNA polymerase, DNA
helicase, 3'-5' DNA Exonuclease, DNA binding proteins
that bind to DNA termini or a combination of said
proteins, thereby reducing non-specific DNA cleavage.
5. The method as claimed in claim 1
wherein the sequences exposed to said endonuclease are 
also exposed to DNA polymerase. 

6. The method as claimed in claim 2
wherein target DNA is analyzed on a multiplex grid.

7. The method as claimed in claim 2 
wherein said polynucleotide is cDNA. 

8. The method as claimed in claim 1,
wherein said sequences are analyzed on a DNA
sequencing gel thereby identifying the location of the
mutation in a target DNA strand relative to DNA
sequencing molecular weight markers.

9. The method as claimed in claim 1 wherein
said mutation is determined as a means of genetic
screening for cancer.

10. The method as claimed in claim 1
wherein said mutation is determined as a means of
detecting genetic alterations leading to birth
defects.

11. A method for determining a mutation in 
a target sequence of single stranded polynucleotide 
with reference to a non-mutated sequence of a 
polynucleotide that is hybridizable with the 
polynucleotide including said target sequence, wherein 
said sequences are amplified, labeled with a 
detectable marker, hybridized to one another, exposed 
to endonuclease and analyzed for the presence of said 
mutation, the improvement comprising the use of a 
mismatch endonuclease derived from celery, the 
activity of said endonuclease comprising: 
a) detection of all mismatches between
said hybridized sequences;

b) recognition of sequence differences
in polynucleotide strands between about 100 bp and
about 3 kb in length;

c) recognition of said mutation in a
target polynucleotide sequence without substantial
adverse effect caused by flanking polynucleotide
sequences;

d) recognition of polynucleotide loops
and insertions between said hybridized sequences; and

e) recognition of said mutation in a
target polynucleotide sequence in which a DNA
polymorphism or another mutation is also present.

12. A mismatch endonuclease for determining
a mutation in a target sequence of single stranded
mammalian polynucleotide with reference to a
non-mutated sequence in a polynucleotide that is
hybridizable with the polynucleotide including said
target sequence, said endonuclease being in
substantially pure form and effective to:

a) detect all mismatches between said
hybridized sequences;

b) recognize sequence differences in
polynucleotide strands between about 100 bp and
about 3 kb in length;

c) recognize said mutations in a
target polynucleotide sequence without substantial
adverse effect caused by flanking DNA sequences; and

13. The endonuclease of claim 12 wherein
said endonuclease recognizes polynucleotide loops and
insertions between said hybridized sequences.

14. The endonuclease of claim 12 wherein
said endonuclease recognizes said mutation in a target
polynucleotide sequence in which a DNA polymorphism or
another mutation is also present.

15. The endonuclease of claim 12 derived
from a plant source.

16. The endonuclease of claim 15, wherein 
said plant source is celery. 



17. The endonuclease of claim 15, wherein 
said plant source is Arabidopsis thaliana.